Wikipedia workload analysis for decentralized hosting
Abstract
We study an access trace containing a sample of Wikipedia’s traffic over a 107-day period, aiming to identify appropriate replication and distribution strategies for a fully decentralized hosting environment. We perform a global analysis of the whole trace, and a detailed analysis of the requests directed to the English edition of Wikipedia. In our study, we classify client requests and examine aspects such as the number of read and save operations, significant load variations, and requests for nonexisting pages. We also review proposed decentralized wiki architectures and discuss how they would handle Wikipedia’s workload. We conclude that decentralized architectures must focus on techniques that handle read operations efficiently while maintaining consistency and dealing with typical issues in decentralized systems such as churn, unbalanced loads, and malicious participants.
Similar resources
Wikipedia Workload Analysis
We study an access trace containing a sample of Wikipedia’s traffic over a 107-day period. We perform a global analysis of the whole trace, and a detailed analysis of the requests directed to the English edition of Wikipedia. In our study, we classify client requests and examine aspects such as the number of read and save operations, flash crowds, and requests for nonexisting pages. We also out...
Corrigendum to "Wikipedia workload analysis for decentralized hosting" [Computer Networks 53 (11) (2009) 1830-1845]
The authors would like to point out an error in article [1]. The paper states in Section 3 that there were 20.6 billion requests in the studied period, but the correct number is 25.6 billion requests. The difference corresponds to media file (image) requests that were not considered originally. This results in some values reported in Tables 1 and 2 and parts of the text being incorrect. Also no...
Quantifying the Relationship between Hit Count Estimates and Wikipedia Article Traffic
This paper analyzes the relationship between search engine hit counts and Wikipedia article views by evaluating the cross correlation between them. We observe the hit count estimates of three popular search engines over a month and compare them with the Wikipedia page views. The strongest cross correlations are recorded with their delays in days. We present the results in both graphs and quanti...
Scalable Web Hosting Service
Web hosting is an infrastructure service that allows to design, integrate, operate and maintain all of the infrastructure components required to run web-based applications. It includes Web server farms, network access, data staging tools and security firewalls. Web server farms are used in a Web hosting infrastructure as a way to create scalable and highly available solutions. One of the main pro...
DBkWik: Towards Knowledge Graph Creation from Thousands of Wikis
Popular public knowledge graphs like DBpedia or YAGO are created from Wikipedia as a source, and thus limited to the information contained therein. At the same time, Wikifarms like Fandom contain Wikis for specific topics, which are often complementary to the information contained in Wikipedia. In this paper, we show how the DBpedia approach can be transferred to Fandom to create DBkWik, a comp...
Journal title:
- Computer Networks
Volume 53, Issue
Pages -
Publication date 2009